(Big)Data in a Virtualized World: Volume, Velocity, and Variety in Enterprise Datacenters

Authors

  • Robert Birke
  • Mathias Björkqvist
  • Lydia Y. Chen
  • Evgenia Smirni
  • Ton Engbersen

Abstract

Virtualization is the ubiquitous way to provide computation and storage services to datacenter end-users. Guaranteeing sufficient data storage and efficient data access is central to all datacenter operations, yet little is known of the effects of virtualization on storage workloads. In this study, we collect and analyze field data from production datacenters that operate within the private cloud paradigm, during a period of three years. The datacenters of our study consist of 8,000 physical boxes, hosting over 90,000 VMs, which in turn use over 22 PB of storage. Storage data is analyzed from the perspectives of volume, velocity, and variety of storage demands on virtual machines and of their dependency on other resources. In addition to the growth rate and churn rate of allocated and used storage volume, the trace data illustrates the impact of virtualization and consolidation on the velocity of IO reads and writes, including IO deduplication ratios and peak load analysis of co-located VMs. We focus on a variety of applications which are roughly classified as app, web, database, file, mail, and print, and correlate their storage and IO demands with CPU, memory, and network usage. This study provides critical storage workload characterization by showing usage trends and how application types create storage traffic in large datacenters.

Similar resources

(Big)data in a virtualized world: volume, velocity, and variety in cloud datacenters

Virtualization is the ubiquitous way to provide computation and storage services to datacenter end-users. Guaranteeing sufficient data storage and efficient data access is central to all datacenter operations, yet little is known of the effects of virtualization on storage workloads. In this study, we collect and analyze field data from production datacenters that operate within the private clo...

Energy-Efficient Big Data Analytics in Datacenters

The volume of generated data increases by the rapid growth of Internet of Things (IoT), leading to the big data proliferation and more opportunities for data centers. Highly virtualized cloud-based datacenters are currently considered for big data analytics. However big data requires datacenters with promoted infrastructure capable of undertaking more responsibilities for handling and analyzin...

Perspectives of Big Data Quality in Smart Service Ecosystems (Quality of Design and Quality of Conformance)

Despite the increasing importance of data and information quality, current research related to Big Data quality is still limited. It is particularly unknown how to apply previous data quality models to Big Data. In this paper we review Big Data quality research from several perspectives and apply a known quality model with its elements of conformance to specification and design in the context o...

Spatial Big Data: Case Studies on Volume, Velocity, and Variety

Increasingly, the size, variety, and update rate of spatial datasets exceed the capacity of commonly used spatial computing and spatial database technologies to learn, manage, and process data with reasonable effort. We believe that this data, which we call Spatial Big Data (SBD), represents the next frontier in spatial computing. Examples of emerging SBD include temporally detailed roadmaps th...

BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

The complexity and diversity of big data systems and their rapid evolution give rise to various new challenges about how we design benchmarks in order to test such systems efficiently and successfully. Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data (i.e. volume, velocity, variety, and veracity)...

Publication date: 2014